# Latency vs Throughput

## Latency
Definition: The time taken to process a single request or operation, from initiation to completion.
- Unit: Measured in milliseconds (ms) or seconds
- Analogy: The time it takes a single car to travel from point A to point B
### Examples in System Design
- Time taken for an API call to return a response
- Time for a database to fetch one record
Low latency means fast response times. It is critical for user-facing systems such as search engines, online games, and trading platforms.
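As a minimal sketch of what "latency of one request" means in practice, the snippet below times a single call from initiation to completion. Here `handler` and `request` are hypothetical placeholders for whatever does the real work:

```python
import time

def measure_latency_ms(handler, request):
    """Time one request end to end; returns (response, elapsed milliseconds)."""
    start = time.perf_counter()          # initiation
    response = handler(request)          # the operation being measured
    elapsed_ms = (time.perf_counter() - start) * 1000
    return response, elapsed_ms

# Example: a fake handler that sleeps for ~50 ms to stand in for real work
_, ms = measure_latency_ms(lambda r: time.sleep(0.05), "GET /users/42")
print(f"latency: {ms:.1f} ms")
```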
## Throughput
Definition: The number of requests a system can handle per unit of time.
- Unit: Measured in requests per second (RPS), queries per second (QPS), or transactions per second (TPS)
- Analogy: The number of cars passing through a highway per minute
### Examples in System Design
- A web server handling 10,000 requests/sec
- Kafka processing 1 million messages/sec
High throughput means the system can handle more load.
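By contrast, throughput is measured over a window of many requests rather than a single one. A hedged sketch, reusing the hypothetical `handler` idea from above:

```python
import time

def measure_throughput_rps(handler, requests):
    """Process a batch of requests and return the observed rate in requests/sec."""
    start = time.perf_counter()
    for request in requests:
        handler(request)
    elapsed = time.perf_counter() - start
    return len(requests) / elapsed

# Example: 200 fake requests that each take ~1 ms
rps = measure_throughput_rps(lambda r: time.sleep(0.001), range(200))
print(f"throughput: {rps:.0f} requests/sec")
```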
## Relationship Between Latency and Throughput
They are related but not the same:
### Different Combinations
- Low latency but low throughput: A system that responds quickly but can't handle many users at once
- High throughput but high latency: A batch processing system that churns through thousands of records at once but takes minutes to return any single result
### Trade-offs
Improving one often comes at the cost of the other:
- Adding parallelism can improve throughput but may increase latency due to coordination overhead
- Optimizing for low latency (e.g., handling each request immediately instead of batching work) may cap maximum throughput; the sketch after this list puts rough numbers on this trade-off
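To make the batching trade-off concrete, here is an illustrative back-of-the-envelope model. The 1 ms per-item and 5 ms per-dispatch costs are invented numbers for illustration, not measurements of any real system:

```python
PER_ITEM_MS = 1.0    # assumed cost to process one record
OVERHEAD_MS = 5.0    # assumed fixed cost per dispatch (network hop, commit, etc.)

def batch_stats(batch_size):
    """Throughput and worst-case latency for a given batch size."""
    batch_ms = OVERHEAD_MS + PER_ITEM_MS * batch_size
    throughput = batch_size / (batch_ms / 1000)   # items per second
    worst_latency = batch_ms                      # last item waits for the whole batch
    return throughput, worst_latency

for size in (1, 10, 100):
    tput, lat = batch_stats(size)
    print(f"batch={size:>3}: ~{tput:,.0f} items/sec, worst-case latency ~{lat:.0f} ms")
```

With these made-up costs, growing the batch from 1 to 100 raises throughput from roughly 167 to roughly 952 items/sec, while worst-case latency grows from 6 ms to 105 ms: more work per unit time, but each individual item waits longer.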
## Real-World Example: Ride-Sharing App (Uber, Ola)
| Metric | Description |
|---|---|
| Latency | Time taken to show nearby drivers after you open the app |
| Throughput | Number of ride requests the system can handle globally per second |
## Optimization Strategies
### To Reduce Latency
- Use caching (Redis, CDN), as in the cache-aside sketch after this list
- Optimize database queries (indexes, denormalization)
- Reduce network hops
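As a hedged illustration of the caching point, a cache-aside read with the redis-py client might look like the following. `fetch_from_db` is a hypothetical stand-in for your real query, and the host and 300-second TTL are assumptions:

```python
import json
import redis  # pip install redis; assumes a Redis server on localhost:6379

cache = redis.Redis(host="localhost", port=6379)

def get_user(user_id, fetch_from_db):
    """Cache-aside read: serve from Redis when possible, else hit the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)             # cache hit: no database round trip
    user = fetch_from_db(user_id)             # cache miss: the slow path we want to avoid
    cache.set(key, json.dumps(user), ex=300)  # keep it warm for 5 minutes (assumed TTL)
    return user
```

Every cached read is one the database never sees, which helps throughput as well, but stale data may be served for up to the TTL.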
### To Increase Throughput
- Horizontal scaling (more servers); the sketch after this list mimics this with a worker pool
- Message queues (Kafka, RabbitMQ)
- Batch processing
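The sketch below imitates horizontal scaling in miniature with a thread pool: the same idea as adding servers behind a load balancer, applied to I/O-bound work inside one process. The request count, worker count, and 10 ms sleep are arbitrary illustration values:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle(request):
    time.sleep(0.01)  # stand-in for an I/O-bound call (database, downstream service)
    return request

requests = list(range(1000))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=32) as pool:  # 32 "servers" instead of 1
    results = list(pool.map(handle, requests))
elapsed = time.perf_counter() - start
print(f"~{len(requests) / elapsed:,.0f} requests/sec with 32 workers")
```

Each request still takes about 10 ms (latency is unchanged), but running 32 at a time multiplies the requests completed per second, which is exactly the throughput-without-latency-improvement pattern described above.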
## Key Takeaways
- Latency = the speed of one request
- Throughput = how many requests the system can handle per unit time
Understanding both metrics is crucial for designing scalable systems that meet user expectations and business requirements.